Word similarity using constructions as contextual features
Abstract
We propose and implement an alternative source of contextual features for word similarity detection, based on the notion of the lexicogrammatical construction. Assuming that selectional restrictions indicate the semantic similarity of words attested in selected positions, we extend the notion of selection beyond single selecting heads to multiword constructions that exert selectional preferences. Our model of 92 million cross-indexed hybrid n-grams (our machine-tractable proxy for constructions), extracted from the British National Corpus (BNC), provides the source of contextual features. We compare its results with those of a grammatical dependency approach (Lin 1998), testing both against WordNet-based similarity rankings (Lin 1998; Resnik 1995). Averaged over the entire set of target nouns and their 10-best candidate similar words, Lin's approach gives overall similarity results closer to the WordNet rankings than the constructional approach does; the constructional approach, however, overtakes Lin's in approximating WordNet similarity for target nouns with a frequency above 3000. While this suggests a feature sparseness for constructions that resolves with higher-frequency nouns, constructions used as shared contextual features give a much higher yield in similarity performance, in approximating WordNet similarity, than grammatical relations do. We examine some cases in detail, showing the sorts of similarity that a constructional approach detects but that a grammatical relations approach, WordNet, or both miss, and that are therefore overlooked in benchmark evaluations.
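As a rough illustration of how constructions can act as shared contextual features, the following Python sketch treats every hybrid n-gram containing a target noun as a feature of that noun by replacing the noun with a slot, weights each (noun, construction) pair by positive pointwise mutual information, and scores noun pairs with Lin's (1998) information-theoretic similarity over shared features. The abstract does not say how the constructional model weights or compares features, so the slot representation, the PMI weighting, and the helper names (`construction_features`, `pmi_weights`, `lin_similarity`, `k_best`) are hypothetical choices made for this sketch, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

SLOT = "_"  # hypothetical placeholder marking the target position in a construction


def construction_features(ngrams, targets):
    """Count (target, construction) pairs.

    Assumption: a hybrid n-gram is a tuple whose tokens are words or POS tags;
    the construction is that tuple with the target noun replaced by SLOT.
    """
    counts = Counter()
    for gram in ngrams:
        for i, tok in enumerate(gram):
            if tok in targets:
                feature = gram[:i] + (SLOT,) + gram[i + 1:]
                counts[(tok, feature)] += 1
    return counts


def pmi_weights(counts):
    """Weight each (word, construction) pair by PMI, keeping positive values only."""
    total = sum(counts.values())
    word_totals, feat_totals = Counter(), Counter()
    for (w, f), c in counts.items():
        word_totals[w] += c
        feat_totals[f] += c
    weights = defaultdict(dict)
    for (w, f), c in counts.items():
        pmi = math.log((c * total) / (word_totals[w] * feat_totals[f]))
        if pmi > 0:
            weights[w][f] = pmi
    return weights


def lin_similarity(weights, w1, w2):
    """Lin-style similarity: weight of shared features over weight of all features."""
    f1, f2 = weights.get(w1, {}), weights.get(w2, {})
    shared = f1.keys() & f2.keys()
    num = sum(f1[f] + f2[f] for f in shared)
    den = sum(f1.values()) + sum(f2.values())
    return num / den if den else 0.0


def k_best(weights, target, k=10):
    """Return the k most similar words to target (10-best by default)."""
    scores = [(other, lin_similarity(weights, target, other))
              for other in weights if other != target]
    scores.sort(key=lambda p: (-p[1], p[0]))
    return scores[:k]


# Toy hybrid n-grams (illustrative only): "DT" stands in for a POS tag.
toy_ngrams = [
    ("a", "cup", "of", "tea"),
    ("a", "cup", "of", "coffee"),
    ("DT", "hot", "tea"),
    ("DT", "hot", "coffee"),
    ("drive", "DT", "car"),
    ("park", "DT", "car"),
]
weights = pmi_weights(construction_features(toy_ngrams, {"tea", "coffee", "car"}))
print(k_best(weights, "tea", k=2))  # [('coffee', 1.0), ('car', 0.0)]
```

In the toy data, "tea" and "coffee" fill the slot of the same two constructions and so share all their features, while "car" shares none. Evaluating such k-best lists against WordNet-based rankings, as the abstract describes, would require a separate WordNet similarity step that this sketch does not attempt.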
Similar resources
Second-Order Word Embeddings from Nearest Neighbor Topological Features
We introduce second-order vector representations of words, induced from nearest neighborhood topological features in pre-trained contextual word embeddings. We then analyze the effects of using second-order embeddings as input features in two deep natural language processing models, for named entity recognition and recognizing textual entailment, as well as a linear model for paraphrase recogni...
Semantic Similarity Measure Using Relational and Latent Topic Features
Computing the semantic similarity between words is one of the key challenges in many language-based applications. Previous work tends to use the contextual information of words to disclose the degree of their similarity. In this paper, we consider the relationships between words in local contexts as well as latent topic information of words to propose a new distributed representation of words f...
Putting Similarity Assessments into Context: Matching Functions with the User's Intended Operations
This paper presents a practical application of context for the evaluation of semantic similarity. The work is based on a new model for the assessment of semantic similarity among entity classes that satisfies cognitive properties of similarity and integrates contextual information. The semantic similarity model represents entity classes by their semantic relations (is-a and part-whole) and thei...
Classifying Domain-Specific Terms Using a Dictionary
Automatically building domain-specific ontologies is a highly challenging task as it requires extracting domain-specific terms from a corpus and assigning them relevant domain concept labels. In this paper, we focus on the second task: i.e., assigning domain concepts to domain-specific terms. Motivated by previous approaches in related research (such as word sense disambiguation (WSD) and named...
Discovering Light Verb Constructions and their Translations from Parallel Corpora without Word Alignment
We propose a method for joint unsupervised discovery of multiword expressions (MWEs) and their translations from parallel corpora. First, we apply independent monolingual MWE extraction in source and target languages simultaneously. Then, we calculate translation probability, association score and distributional similarity of co-occurring pairs. Finally, we rank all translations of a given MWE ...
Publication date: 2013